Bootstrapping Distantly Supervised IE Using Joint Learning and Small Well-Structured Corpora

Authors

  • Lidong Bing
  • Bhuwan Dhingra
  • Kathryn Mazaitis
  • Jong Hyuk Park
  • William W. Cohen
Abstract

We propose a framework to improve the performance of distantly-supervised relation extraction by jointly learning to solve two related tasks: concept-instance extraction and relation extraction. We further extend this framework to make a novel use of document structure: in some small, well-structured corpora, sections can be identified that correspond to relation arguments, and distantly-labeled examples from such sections tend to have good precision. Using these as seeds, we extract additional relation examples by applying label propagation on a graph composed of noisy examples extracted from a large unstructured testing corpus. Combined with the soft constraint that concept examples should have the same type as the second argument of the relation, we get significant improvements over several state-of-the-art approaches to distantly-supervised relation extraction, and reasonable extraction performance even with a very small set of...
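The seed-and-propagate step mentioned in the abstract can be illustrated with a minimal sketch. This is a generic iterative label propagation with clamped seeds, not necessarily the propagation method used in the paper. The graph, the node names (candidate argument pairs and `ctx:` context features), and the labels `treats` and `side_effect` are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def propagate_labels(edges, seeds, iterations=20):
    """Spread seed labels over an undirected graph; seed nodes stay clamped."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    labels = sorted(set(seeds.values()))
    # scores[node][label] is the current belief that the node carries the label.
    scores = {node: {label: 0.0 for label in labels} for node in neighbors}
    for node, label in seeds.items():
        scores.setdefault(node, {l: 0.0 for l in labels})[label] = 1.0

    for _ in range(iterations):
        updated = {}
        for node in scores:
            if node in seeds:
                # High-precision seeds keep their label throughout.
                updated[node] = scores[node]
                continue
            nbrs = neighbors[node]
            # A non-seed node takes the average of its neighbors' scores.
            updated[node] = {
                label: sum(scores[m][label] for m in nbrs) / len(nbrs)
                for label in labels
            }
        scores = updated
    return scores


# Hypothetical toy graph: candidate pairs linked to the contexts they occur in.
edges = [
    ("aspirin|headache", "ctx:used to treat"),
    ("ibuprofen|fever", "ctx:used to treat"),
    ("aspirin|nausea", "ctx:may cause"),
    ("warfarin|bleeding", "ctx:may cause"),
]
# Seeds stand in for examples taken from well-structured document sections.
seeds = {"aspirin|headache": "treats", "aspirin|nausea": "side_effect"}

scores = propagate_labels(edges, seeds)
for node in ("ibuprofen|fever", "warfarin|bleeding"):
    best = max(scores[node], key=scores[node].get)
    print(node, best, round(scores[node][best], 3))
```

In this toy graph, the unlabeled pair that shares a context with a seed ends up with that seed's label, which is the intuition behind using high-precision seeds to label noisy candidates from a larger corpus.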


Similar resources

Semi-supervised Bootstrapping of Relation Triples from the Web, Query Languages over these Noisy Triples, their Semantics, and Query Execution Systems

Information Extraction (IE) is the process of retrieving structured information from unstructured text. IE has traditionally relied on extensive human intervention to extract a small set of predefined relations from a corpus. With the arrival of the Web, the methods and goals of IE have shifted, with increasing focus on the following challenges: 1. Domain independent/Open Information...


Bootstrapping Chatbots for Novel Domains

We tackle the problem of automatically generating chatbots from Web API specifications using embedded natural language metadata, focusing on the intent classification subtask. One of the main challenges for such a use case comes from the lack of a sufficiently representative training sample for utterance classification, which hinders the traditional supervised model’s ability to generalize to u...


Distant IE by Bootstrapping Using Lists and Document Structure

Distant labeling for information extraction (IE) suffers from noisy training data. We describe a way of reducing the noise associated with distant IE by identifying coupling constraints between potential instance labels. As one example of coupling, items in a list are likely to have the same label. A second example of coupling comes from analysis of document structure: in some corpora, sections...


Filtered Ranking for Bootstrapping in Event Extraction

Several researchers have proposed semi-supervised learning methods for adapting event extraction systems to new event types. This paper investigates two kinds of bootstrapping methods used for event extraction: the document-centric and similarity-centric approaches, and proposes a filtered ranking method that combines the advantages of the two. We use a range of extraction tasks to compare the ...


Semantic Role Labeling

This tutorial will describe semantic role labeling, the assignment of semantic roles to eventuality participants in an attempt to approximate a semantic representation of an utterance. The linguistic background and motivation for the definition of semantic roles will be presented, as well as the basic approach to semantic role annotation of large amounts of corpora. Recent extensions to this ap...




Publication date: 2017